35 research outputs found

    Contextual equivalence in lambda-calculi extended with letrec and with a parametric polymorphic type system

    Get PDF
    This paper describes a method to treat contextual equivalence in polymorphically typed lambda-calculi, and also how to transfer equivalences from the untyped versions of lambda-calculi to their typed variant, where our specific calculus has letrec, recursive types and is nondeterministic. An addition of a type label to every subexpression is all that is needed, together with some natural constraints for the consistency of the type labels and well-scopedness of expressions. One result is that an elementary but typed notion of program transformation is obtained and that untyped contextual equivalences also hold in the typed calculus as long as the expressions are well-typed. In order to have a nice interaction between reduction and typing, some reduction rules have to be accompanied with a type modification by generalizing or instantiating types

    Word Discovery in Visually Grounded, Self-Supervised Speech Models

    Full text link
    We present a method for visually-grounded spoken term discovery. After training either a HuBERT or wav2vec2.0 model to associate spoken captions with natural images, we show that powerful word segmentation and clustering capability emerges within the model's self-attention heads. Our experiments reveal that this ability is not present to nearly the same extent in the base HuBERT and wav2vec2.0 models, suggesting that the visual grounding task is a crucial component of the word discovery capability we observe. We also evaluate our method on the Buckeye word segmentation and ZeroSpeech spoken term discovery tasks, where we outperform all currently published methods on several metrics.Comment: submitted to Interspeech 202

    Phoneme Segmentation Using Self-Supervised Speech Models

    Full text link
    We apply transfer learning to the task of phoneme segmentation and demonstrate the utility of representations learned in self-supervised pre-training for the task. Our model extends transformer-style encoders with strategically placed convolutions that manipulate features learned in pre-training. Using the TIMIT and Buckeye corpora we train and test the model in the supervised and unsupervised settings. The latter case is accomplished by furnishing a noisy label-set with the predictions of a separate model, it having been trained in an unsupervised fashion. Results indicate our model eclipses previous state-of-the-art performance in both settings and on both datasets. Finally, following observations during published code review and attempts to reproduce past segmentation results, we find a need to disambiguate the definition and implementation of widely-used evaluation metrics. We resolve this ambiguity by delineating two distinct evaluation schemes and describing their nuances.Comment: Accepted to SLT 202
    corecore